In [5]:
import pandas as pd
import networkx as nx
import os
import numpy as np

Tables to Networks, Networks to Tables

Networks can be represented in tabular form in two ways: as an adjacency list, with edge attributes stored as columnar values, and as a node list, with node attributes stored as columnar values.

Storing the network data as a single massive adjacency table, with node attributes repeated on each row, can get unwieldy, especially if the graph is large, or grows to be so. One way to get around this is to store two files: one with node data and node attributes, and one with edge data and edge attributes.

The Divvy bike sharing data set is one example of network data stored this way.
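
As a minimal sketch of the two-table idea, the snippet below builds a small directed graph from a node table and an edge table. The station ids, names, and counts are made up for illustration and are not taken from the Divvy files; `G_tables` is a hypothetical name.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy data: one table for nodes, one for edges.
nodes = pd.DataFrame(
    {"name": ["A St", "B Ave", "C Blvd"], "capacity": [15, 19, 23]},
    index=pd.Index([1, 2, 3], name="id"),
)
edges = pd.DataFrame(
    {"source": [1, 1, 2], "target": [2, 3, 3], "count": [10, 4, 7]}
)

G_tables = nx.DiGraph()

# Each row of the node table becomes a node with its columns as attributes.
for node_id, row in nodes.iterrows():
    G_tables.add_node(node_id, **row.to_dict())

# Each row of the edge table becomes a directed edge with a count attribute.
for source, target, count in edges.itertuples(index=False):
    G_tables.add_edge(source, target, count=count)
```

Node attributes live only in the node table, so they are never repeated on each edge row.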

Loading Node Lists and Adjacency Lists

Let's use the Divvy bike sharing data set as a starting point. The Divvy data set comprises the following data:

  • Stations and metadata (like a node list with attributes saved)
  • Trips and metadata (like an edge list with attributes saved)

The README.txt file in the Divvy directory should help orient you around the data.


In [6]:
stations = pd.read_csv('datasets/divvy_2013/Divvy_Stations_2013.csv', parse_dates=['online date'], index_col='id')
stations


Out[6]:
name latitude longitude dpcapacity landmark online date
id
5 State St & Harrison St 41.873958 -87.627739 19 30 2013-06-28
13 Wilton Ave & Diversey Pkwy 41.932500 -87.652681 19 66 2013-06-28
14 Morgan St & 18th St 41.858086 -87.651073 15 163 2013-06-28
15 Racine Ave & 18th St 41.858181 -87.656487 15 164 2013-06-28
16 Wood St & North Ave 41.910329 -87.672516 15 223 2013-08-12
17 Wood St & Division St 41.903320 -87.672730 15 246 2013-06-28
19 Loomis St & Taylor St 41.869417 -87.660996 15 139 2013-06-28
20 Sheffield Ave & Kingsbury St 41.909592 -87.653497 15 154 2013-06-28
21 Aberdeen St & Jackson Blvd 41.877726 -87.654787 15 157 2013-06-28
22 May St & Taylor St 41.869482 -87.655486 15 160 2013-06-28
23 Orleans St & Elm St 41.902924 -87.637715 15 172 2013-06-28
24 Fairbanks Ct & Grand Ave 41.891860 -87.620620 15 262 2013-06-28
25 Michigan Ave & Pearson St 41.897660 -87.623510 23 34 2013-06-28
26 McClurg Ct & Illinois St 41.891020 -87.617300 23 51 2013-06-28
27 Larrabee St & North Ave 41.910210 -87.643500 19 174 2013-06-28
28 Larrabee St & Menomonee St 41.914680 -87.643320 15 282 2013-06-28
29 Noble St & Milwaukee Ave 41.900680 -87.662600 15 290 2013-06-28
30 Ashland Ave & Augusta Blvd 41.899643 -87.667700 15 248 2013-06-28
31 Franklin St & Chicago Ave 41.896802 -87.635638 23 17 2013-06-28
32 Racine Ave & Congress Pkwy 41.874640 -87.657030 19 76 2013-06-28
33 State St & Van Buren St 41.877181 -87.627844 27 3 2013-06-28
34 Cannon Dr & Fullerton Ave 41.926756 -87.634429 15 124 2013-06-28
35 Streeter Dr & Illinois St 41.891071 -87.612200 35 22 2013-08-05
36 Franklin St & Jackson Blvd 41.877708 -87.635321 27 19 2013-06-28
37 Dearborn St & Adams St 41.879356 -87.629791 19 20 2013-06-28
42 Wabash Ave & Cermak Rd 41.853239 -87.625337 15 170 2013-06-28
43 Michigan Ave & Washington St 41.883893 -87.624649 43 1 2013-06-28
44 State St & Randolph St 41.884730 -87.627734 27 2 2013-06-28
45 Michigan Ave & Congress Pkwy 41.876066 -87.624433 15 40 2013-06-28
46 Wells St & Walton St 41.899930 -87.634430 19 46 2013-06-28
... ... ... ... ... ... ...
322 Kimbark Ave & 53rd St 41.799568 -87.594747 15 397 2013-09-21
323 Sheridan Rd & Lawrence Ave 41.969517 -87.654691 15 384 2013-09-23
324 Stockton Dr & Wrightwood Ave 41.931320 -87.638742 15 276 2013-10-03
325 Clark St & Winnemac Ave 41.973385 -87.668365 15 392 2013-09-23
326 Clark St & Leland Ave 41.967096 -87.667429 11 239 2013-09-27
327 Sheffield Ave & Webster Ave 41.921687 -87.653714 19 188 2013-09-25
328 Ellis Ave & 58th St 41.788746 -87.601334 15 365 2013-09-25
329 Lake Shore Dr & Diversey Pkwy 41.932684 -87.636250 15 347 2013-09-25
330 Lincoln Ave & Addison St 41.946176 -87.673308 19 77 2013-09-26
331 Halsted St & Blackhawk St 41.908540 -87.648568 19 176 2013-09-28
332 Halsted St & Diversey Pkwy 41.933341 -87.648747 15 208 2013-09-28
333 Ashland Ave & Blackhawk St 41.907066 -87.667252 15 224 2013-09-27
334 Lake Shore Dr & Belmont Ave 41.940775 -87.639192 19 233 2013-09-27
335 Calumet Ave & 35th St 41.831379 -87.618034 15 345 2013-10-17
336 Cottage Grove Ave & 47th St 41.809855 -87.606755 15 422 2013-10-17
337 Clark St & Chicago Ave 41.896544 -87.630931 19 303 2013-09-27
338 Calumet Ave & 18th St 41.857611 -87.619407 15 102 2013-09-28
339 Emerald Ave & 31st St 41.838198 -87.645143 11 404 2013-09-28
340 Clark St & Wrightwood Ave 41.929546 -87.643118 15 209 2013-10-24
341 Adler Planetarium 41.866095 -87.607267 19 431 2013-10-09
342 Wolcott Ave & Polk St 41.871262 -87.673688 15 284 2013-10-12
343 Racine Ave & Wrightwood Ave 41.928887 -87.658971 15 297 2013-10-24
344 Wolcott Ave & Lawrence Ave 41.968641 -87.676335 15 26 2013-10-09
345 Lake Park Ave & 56th St 41.793242 -87.587782 15 119 2013-10-09
346 Ada St & Washington Blvd 41.882830 -87.661206 15 353 2013-10-10
347 Ashland Ave & Grace St 41.950687 -87.668700 15 319 2013-10-12
348 California Ave & 21st St 41.854016 -87.695445 15 96 2013-10-14
349 Halsted St & Wrightwood Ave 41.929143 -87.649077 15 210 2013-10-28
350 Ashland Ave & Chicago Ave 41.895966 -87.667747 15 247 2013-10-22
351 Cottage Grove Ave & 51st St 41.803038 -87.606615 15 440 2013-10-17

300 rows × 6 columns


In [7]:
trips = pd.read_csv('datasets/divvy_2013/Divvy_Trips_2013.csv', parse_dates=['starttime', 'stoptime'], index_col=['trip_id'])
trips = trips.sort_index()  # sort rows by trip_id; DataFrame.sort() is no longer available in pandas
trips


/Users/ericmjl/anaconda/envs/network_tutorial/lib/python3.4/site-packages/pandas/io/parsers.py:1170: DtypeWarning: Columns (10) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
Out[7]:
starttime stoptime bikeid tripduration from_station_id from_station_name to_station_id to_station_name usertype gender birthday
trip_id
3940 2013-06-27 01:06:00 2013-06-27 09:46:00 914 31177 91 Clinton St & Washington Blvd 48 Larrabee St & Kingsbury St Subscriber Male 1982
4095 2013-06-27 12:06:00 2013-06-27 12:11:00 480 301 85 Michigan Ave & Oak St 85 Michigan Ave & Oak St Subscriber Male 1982
4113 2013-06-27 11:09:00 2013-06-27 11:11:00 711 140 88 May St & Randolph St 88 May St & Randolph St Subscriber Male 1982
4118 2013-06-27 12:11:00 2013-06-27 12:16:00 480 316 85 Michigan Ave & Oak St 28 Larrabee St & Menomonee St Customer NaN NaN
4119 2013-06-27 11:12:00 2013-06-27 11:13:00 711 87 88 May St & Randolph St 88 May St & Randolph St Subscriber Male 1982
4134 2013-06-27 11:24:00 2013-06-27 14:38:00 145 11674 17 Wood St & Division St 61 Wood St & Milwaukee Ave Subscriber Male 1978
4162 2013-06-27 11:39:00 2013-06-27 16:01:00 711 15758 88 May St & Randolph St 34 Cannon Dr & Fullerton Ave Subscriber Male 1982
4192 2013-06-27 12:15:00 2013-06-27 12:16:00 303 60 28 Larrabee St & Menomonee St 28 Larrabee St & Menomonee St Subscriber Male 1982
4216 2013-06-27 13:00:00 2013-06-27 13:03:00 907 171 45 Michigan Ave & Congress Pkwy 90 Millennium Park Subscriber Male 1982
4255 2013-06-27 13:18:00 2013-06-27 19:34:00 907 22549 45 Michigan Ave & Congress Pkwy 54 Ogden Ave & Chicago Ave Subscriber Male 1982
4263 2013-06-27 14:39:00 2013-06-27 14:40:00 145 62 61 Wood St & Milwaukee Ave 300 Broadway & Barry Ave Subscriber Male 1978
4275 2013-06-27 14:44:00 2013-06-27 14:45:00 77 64 32 Racine Ave & Congress Pkwy 32 Racine Ave & Congress Pkwy Customer NaN NaN
4288 2013-06-27 14:56:00 2013-06-27 14:57:00 524 66 68 Clinton St & Tilden St 68 Clinton St & Tilden St Subscriber Male 1983
4289 2013-06-27 14:57:00 2013-06-27 15:05:00 78 487 32 Racine Ave & Congress Pkwy 349 Halsted St & Wrightwood Ave Subscriber Female 1980
4291 2013-06-27 14:58:00 2013-06-27 15:05:00 77 433 32 Racine Ave & Congress Pkwy 19 Loomis St & Taylor St Customer NaN NaN
4316 2013-06-27 15:06:00 2013-06-27 15:09:00 77 123 19 Loomis St & Taylor St 19 Loomis St & Taylor St Customer NaN NaN
4342 2013-06-27 15:13:00 2013-06-27 15:27:00 77 852 19 Loomis St & Taylor St 55 Halsted St & James M Rochford St Customer NaN NaN
4343 2013-06-27 15:09:00 2013-06-27 15:14:00 587 272 68 Clinton St & Tilden St 68 Clinton St & Tilden St Subscriber Male 1983
4345 2013-06-27 16:14:00 2013-06-27 16:15:00 711 83 34 Cannon Dr & Fullerton Ave 311 Lincoln Ave & Eastwood Ave Subscriber Male 1982
4346 2013-06-27 15:14:00 2013-06-28 00:18:00 145 32646 61 Wood St & Milwaukee Ave 91 Clinton St & Washington Blvd Subscriber Male 1978
4350 2013-06-27 15:15:00 2013-06-27 15:27:00 78 730 349 Halsted St & Wrightwood Ave 55 Halsted St & James M Rochford St Subscriber Female 1980
4378 2013-06-27 15:54:00 2013-06-27 23:15:00 524 26479 320 Loomis St & Lexington St 27 Larrabee St & North Ave Subscriber Male 1983
4384 2013-06-27 16:42:00 2013-06-27 22:23:00 78 20457 55 Halsted St & James M Rochford St 31 Franklin St & Chicago Ave Subscriber Female 1980
4389 2013-06-27 16:34:00 2013-06-27 16:36:00 66 127 42 Wabash Ave & Cermak Rd 42 Wabash Ave & Cermak Rd Subscriber Female 1980
4390 2013-06-27 16:41:00 2013-06-27 16:50:00 66 528 42 Wabash Ave & Cermak Rd 42 Wabash Ave & Cermak Rd Subscriber Female 1980
4415 2013-06-27 18:21:00 2013-06-27 18:37:00 384 973 36 Franklin St & Jackson Blvd 36 Franklin St & Jackson Blvd Subscriber Male 1967
4427 2013-06-27 18:32:00 2013-06-27 18:37:00 348 308 81 Daley Center Plaza 81 Daley Center Plaza Subscriber Male 1971
4476 2013-06-27 18:38:00 2013-06-27 19:35:00 152 3401 81 Daley Center Plaza 67 Sheffield Ave & Fullerton Ave Subscriber Male 1964
4480 2013-06-27 19:40:00 2013-06-27 22:28:00 27 10105 340 Clark St & Wrightwood Ave 46 Wells St & Walton St Customer NaN NaN
4490 2013-06-27 18:45:00 2013-06-27 19:03:00 418 1094 37 Dearborn St & Adams St 76 Lake Shore Dr & Monroe St Customer NaN NaN
... ... ... ... ... ... ... ... ... ... ... ...
1109177 2013-12-31 19:34:00 2013-12-31 20:01:00 825 1632 289 Wells St & Concord Ln 165 Clark St & Waveland Ave Subscriber Male 1985
1109198 2013-12-31 19:42:00 2013-12-31 19:51:00 2181 524 214 Damen Ave & Grand Ave 285 Wood St & Grand Ave Subscriber Female 1979
1109199 2013-12-31 19:42:00 2013-12-31 19:51:00 1978 523 214 Damen Ave & Grand Ave 285 Wood St & Grand Ave Subscriber Male 1976
1109201 2013-12-31 19:44:00 2013-12-31 20:01:00 1304 1003 158 Milwaukee Ave & Wabansia Ave 214 Damen Ave & Grand Ave Customer NaN NaN
1109202 2013-12-31 19:55:00 2013-12-31 20:07:00 188 712 304 Halsted St & Waveland Ave 251 Clarendon Ave & Leland Ave Subscriber Male 1969
1109203 2013-12-31 20:02:00 2013-12-31 20:11:00 171 548 165 Clark St & Waveland Ave 234 Clark St & Montrose Ave Subscriber Male 1985
1109222 2013-12-31 20:12:00 2013-12-31 20:25:00 2484 804 100 Orleans St & Merchandise Mart Plaza 75 Canal St & Jackson Blvd Subscriber Male 1964
1109223 2013-12-31 20:30:00 2013-12-31 20:37:00 435 390 234 Clark St & Montrose Ave 318 Southport Ave & Irving Park Rd Subscriber Female 1981
1109224 2013-12-31 20:31:00 2013-12-31 20:38:00 2676 387 305 Western Ave & Division St 130 Damen Ave & Division St Subscriber Male 1984
1109231 2013-12-31 20:38:00 2014-01-01 12:08:00 2676 55776 130 Damen Ave & Division St 230 Lincoln Ave & Roscoe St Subscriber Male 1984
1109233 2013-12-31 20:43:00 2013-12-31 20:51:00 1400 470 118 Sedgwick St & North Ave 138 Clybourn Ave & Division St Subscriber Male 1972
1109240 2013-12-31 20:51:00 2013-12-31 21:05:00 36 819 141 Clark St & Lincoln Ave 181 LaSalle St & Illinois St Subscriber Male 1972
1109256 2013-12-31 21:07:00 2013-12-31 21:10:00 1917 209 60 Dayton St & North Ave 20 Sheffield Ave & Kingsbury St Subscriber Female 1983
1109257 2013-12-31 21:12:00 2013-12-31 21:16:00 1917 284 20 Sheffield Ave & Kingsbury St 93 Sheffield Ave & Willow St Subscriber Female 1983
1109275 2013-12-31 21:36:00 2013-12-31 22:00:00 2799 1421 227 Southport Ave & Waveland Ave 228 Damen Ave & Melrose Ave Subscriber Male 1984
1109276 2013-12-31 21:36:00 2013-12-31 22:01:00 2859 1493 227 Southport Ave & Waveland Ave 228 Damen Ave & Melrose Ave Subscriber Female 1978
1109277 2013-12-31 21:36:00 2013-12-31 22:01:00 2642 1447 227 Southport Ave & Waveland Ave 228 Damen Ave & Melrose Ave Subscriber Male 1986
1109278 2013-12-31 21:38:00 2013-12-31 22:00:00 2316 1309 227 Southport Ave & Waveland Ave 228 Damen Ave & Melrose Ave Subscriber Male 1989
1109279 2013-12-31 21:50:00 2013-12-31 22:07:00 1667 987 119 Ashland Ave & Lake St 275 Ashland Ave & 13th St Subscriber Male 1977
1109280 2013-12-31 21:55:00 2013-12-31 22:04:00 171 573 234 Clark St & Montrose Ave 254 Pine Grove Ave & Irving Park Rd Customer NaN NaN
1109283 2013-12-31 22:03:00 2013-12-31 22:13:00 198 650 284 Michigan Ave & Jackson Blvd 43 Michigan Ave & Washington St Subscriber Female 1976
1109308 2013-12-31 22:10:00 2013-12-31 22:16:00 2931 353 130 Damen Ave & Division St 69 Damen Ave & Pierce Ave Subscriber Male 1972
1109309 2013-12-31 22:10:00 2013-12-31 22:16:00 2048 346 130 Damen Ave & Division St 69 Damen Ave & Pierce Ave Subscriber Female 1976
1109310 2013-12-31 22:12:00 2013-12-31 22:16:00 347 248 340 Clark St & Wrightwood Ave 300 Broadway & Barry Ave Subscriber Male 1989
1109331 2013-12-31 22:20:00 2013-12-31 22:26:00 347 357 300 Broadway & Barry Ave 117 Wilton Ave & Belmont Ave Subscriber Male 1989
1109336 2013-12-31 22:29:00 2013-12-31 22:35:00 788 368 169 Canal St & Harrison St 77 Clinton St & Madison St Subscriber Male 1952
1109338 2013-12-31 22:35:00 2013-12-31 22:49:00 2239 868 216 California Ave & Division St 69 Damen Ave & Pierce Ave Subscriber Female 1978
1109369 2013-12-31 23:07:00 2013-12-31 23:31:00 1536 1492 22 May St & Taylor St 22 May St & Taylor St Subscriber Male 1985
1109392 2013-12-31 23:36:00 2013-12-31 23:46:00 2069 600 120 Wentworth Ave & Archer Ave 135 Halsted St & 21st St Subscriber Male 1984
1109397 2013-12-31 23:46:00 2013-12-31 23:52:00 2097 316 206 Halsted St & Archer Ave 339 Emerald Ave & 31st St Subscriber Male 1963

759788 rows × 11 columns

At this point, we have our stations and trips data loaded into memory.

How we construct the graph depends on the kind of questions we want to answer, which makes the definition of the "unit of consideration" (the entities whose relationships we are trying to model) extremely important.

Let's try to answer the question: "What are the most popular trip paths?" In this case, the bike station is a reasonable "unit of consideration", so we will use the bike stations as the nodes.

To start, let's initialize a directed graph G.


In [8]:
G = nx.DiGraph()

Then, let's iterate over the stations DataFrame, and add in the node attributes.


In [9]:
for r, d in stations.iterrows(): # call the pandas DataFrame row-by-row iterator
    G.add_node(r, **d.to_dict())  # unpack the row's columns as node attributes

To answer the question of "which stations are important", we need to be more specific. A measure such as betweenness centrality or degree centrality may be appropriate here.

Before we can compute either, the graph needs edges. The naive way to add them would be to iterate over every row of the trips DataFrame. Go ahead and try it at your own risk; it may take a long time :-). Alternatively, I would suggest doing a pandas groupby, which counts the trips between each pair of stations directly.


In [ ]:
# # Run the following code at your own risk :)
# for r, d in trips.iterrows():
#     start = d['from_station_id']
#     end = d['to_station_id']
#     if not G.has_edge(start, end):
#         G.add_edge(start, end, count=1)
#     else:
#         G[start][end]['count'] += 1

In [ ]:
for (start, stop), d in trips.groupby(['from_station_id', 'to_station_id']):
    G.add_edge(start, stop, count=len(d))
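
To see what the groupby does, here is a small sketch on a made-up trips table (the ids below are illustrative, not real trips). Grouping by the station pair and taking the group size gives the trip count per directed edge, and sorting the edges by that count gives the most popular trip paths. `toy_trips` and `G_counts` are hypothetical names.

```python
import pandas as pd
import networkx as nx

# Hypothetical trips: each row is one trip between two station ids.
toy_trips = pd.DataFrame({
    "from_station_id": [5, 5, 13, 5, 13],
    "to_station_id":   [13, 13, 5, 14, 5],
})

# One entry per (from, to) pair, holding the number of trips for that pair.
pair_counts = toy_trips.groupby(["from_station_id", "to_station_id"]).size()

G_counts = nx.DiGraph()
for (start, stop), n_trips in pair_counts.items():
    G_counts.add_edge(start, stop, count=int(n_trips))

# Rank the directed edges by trip count, most popular first.
popular = sorted(G_counts.edges(data="count"), key=lambda e: e[2], reverse=True)
```

Each unique station pair is visited once, rather than once per trip as in the row-by-row loop.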

First off, let's figure out how dense the graph is. The graph's density is the number of edges divided by the number of possible edges.

NetworkX provides an implementation of graph density, but it assumes self-loops are not allowed. (Self-loops are edges from one node to itself.) Let's see what the graph density is.


In [ ]:
G.edges(data=True)
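
To make the density definition concrete: for a directed graph with n nodes and m edges, `nx.density` returns m / (n(n - 1)), which leaves self-loops out of the possible edges. If self-loops are allowed, as with trips that start and end at the same station, one option is to divide by n² instead. The sketch below compares the two on a toy graph; `density_with_self_loops` is a hypothetical helper, not part of NetworkX.

```python
import networkx as nx

def density_with_self_loops(graph):
    """Density of a directed graph when self-loops count as possible edges."""
    n = graph.number_of_nodes()
    if n == 0:
        return 0.0
    return graph.number_of_edges() / (n * n)

# Toy directed graph with 3 nodes and 4 edges, one of them a self-loop.
G_loops = nx.DiGraph()
G_loops.add_edges_from([(1, 2), (2, 1), (1, 1), (2, 3)])

builtin_density = nx.density(G_loops)               # 4 / (3 * 2)
loop_density = density_with_self_loops(G_loops)     # 4 / (3 * 3)
```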

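Applying what we learned earlier, let's use the betweenness centrality metric. Note that NetworkX interprets the weight attribute as a distance when finding shortest paths, so a higher count makes an edge "longer".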


In [ ]:
centralities = nx.betweenness_centrality(G, weight='count')

In [ ]:
sorted(centralities.items(), key=lambda x:x[1], reverse=True)
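
If heavily travelled routes should count as "short" rather than "long", one possible workaround is to derive a distance attribute as the inverse of the count and pass that as the weight. This is only a sketch of one option, on a made-up three-node graph; the `distance` attribute name is an assumption, not something defined in the Divvy data.

```python
import networkx as nx

# Toy graph: a -> b -> c is heavily travelled, a -> c is rarely used.
G_paths = nx.DiGraph()
G_paths.add_edge("a", "b", count=100)
G_paths.add_edge("b", "c", count=100)
G_paths.add_edge("a", "c", count=1)

# Assumed convention: more trips means a shorter distance.
for u, v, d in G_paths.edges(data=True):
    d["distance"] = 1.0 / d["count"]

# With weight="count", the rarely used direct edge is the shortest path.
by_count = nx.betweenness_centrality(G_paths, weight="count")

# With weight="distance", the popular route through b is the shortest path.
by_distance = nx.betweenness_centrality(G_paths, weight="distance")
```

Here node `b` lies on no shortest path under `count`, but it lies on the shortest path from `a` to `c` under `distance`.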

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.bar(list(centralities.keys()), list(centralities.values()))  # lists work across matplotlib versions

Applying what we learned earlier, let's use the "degree centrality" metric as well.


In [ ]:
decentrality = nx.degree_centrality(G)
plt.bar(list(decentrality.keys()), list(decentrality.values()))  # lists work across matplotlib versions
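
For a directed graph, `nx.degree_centrality` counts incoming and outgoing edges together and divides by n - 1, so values can exceed 1. NetworkX also provides `nx.in_degree_centrality` and `nx.out_degree_centrality` to separate arrivals from departures. A small sketch on a toy graph (the node ids are illustrative):

```python
import networkx as nx

# Toy directed graph: node 2 receives two edges and sends one.
G_deg = nx.DiGraph([(1, 2), (3, 2), (2, 1)])

total = nx.degree_centrality(G_deg)         # (in + out) / (n - 1)
incoming = nx.in_degree_centrality(G_deg)   # in-degree / (n - 1)
outgoing = nx.out_degree_centrality(G_deg)  # out-degree / (n - 1)
```

For bike stations, the in- and out-degree variants distinguish popular destinations from popular starting points.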

The code above should have demonstrated to you the basic logic behind storing graph data in a human-readable format. For the richest data format, you can store a node list with attributes, and an edge list (a.k.a. adjacency list) with attributes.
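
Going in the other direction, from a network back to tables, can be sketched as follows. The graph here is a toy stand-in, and `node_table` and `edge_table` are hypothetical names; the same pattern should apply to any graph whose attributes are plain values.

```python
import pandas as pd
import networkx as nx

# Toy graph standing in for a graph built earlier.
G_back = nx.DiGraph()
G_back.add_node(1, name="A St")
G_back.add_node(2, name="B Ave")
G_back.add_edge(1, 2, count=10)
G_back.add_edge(2, 1, count=3)

# Node list: one row per node, one column per node attribute.
node_table = pd.DataFrame.from_dict(dict(G_back.nodes(data=True)), orient="index")

# Edge list: one row per edge, with source, target, and edge attributes.
edge_table = pd.DataFrame(
    [{"source": u, "target": v, **d} for u, v, d in G_back.edges(data=True)]
)
```

Both tables can then be written out with `to_csv`, giving the two-file layout described at the start.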

Saving NetworkX Graph Files

NetworkX's API offers many formats for storing graphs to disk. If you intend to work exclusively with NetworkX, then pickling the graph to disk is probably the easiest way.

To write to disk:

nx.write_gpickle(G, handle)

To load from disk:

G = nx.read_gpickle(handle)

Let's write the graph to disk so that we can analyze it further in other notebooks.


In [ ]:
nx.write_gpickle(G, 'datasets/divvy_2013/divvy_graph.pkl')
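
Note that `write_gpickle` and `read_gpickle` were removed in NetworkX 3.0. On newer versions, a minimal sketch of the same round trip with Python's standard `pickle` module looks like this. The toy graph and the temporary file path are illustrative only.

```python
import os
import pickle
import tempfile

import networkx as nx

# Toy graph standing in for the Divvy graph.
G_pickle = nx.DiGraph()
G_pickle.add_edge(5, 13, count=2)

# Illustrative location; in the notebook this would be a path under datasets/.
path = os.path.join(tempfile.mkdtemp(), "toy_graph.pkl")

# Write the graph to disk.
with open(path, "wb") as handle:
    pickle.dump(G_pickle, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back.
with open(path, "rb") as handle:
    G_loaded = pickle.load(handle)
```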

In [ ]: